Diagnosing injection-production system faults in the same well using the rough set-LVQ neural network

This study proposes a reverse calculation model, based on the unique structure of the rod pump injection-production system in the same well, to diagnose downhole faults; the model was used to draw the dynamometer diagrams of the production and injection pumps. The invariant moment method was applied to extract seven features from the injection pump dynamometer diagram, and a downhole fault diagnosis system for rod pump injection-production systems in the same well was established using Rough Set (RS) theory and Learning Vector Quantization (LVQ). While keeping the classification ability unchanged, a Self-Organizing Map (SOM) neural network was used to discretize the original feature data, and RS theory was employed for attribute reduction. After the LVQ fault diagnosis subsystem was established, the reduced decision table was entered for learning and training. The test results confirmed the efficacy and accuracy of this method in diagnosing downhole faults in rod pump injection-production systems in the same well. Comparison of the test results with the actual working conditions shows that the RS-LVQ-based diagnosis system has a recognition rate of 91.3% for fault types, strong recognition ability, a short diagnosis time, and practical value. However, this paper considers only single faults, whereas actual downhole conditions are complex and two or more fault types may occur at the same time, which limits the method.


Introduction
Most oil fields in China have entered the high water cut stage. Excessive water cut increases production costs, and when the economic limit is exceeded, some oil wells are even forced to be abandoned [1,2]. Same-well injection-production rod pump systems are an effective means of stabilizing oil production and controlling water in extra-high water cut oil wells: production and injection are combined in a single wellbore, which reduces the water content of the produced fluid [3]. However, the downhole section of this system consists of two pumps, an injection pump and a production pump, giving it a more complicated structure than a conventional rod pump system. Failing to identify and address faults in time adversely affects the production and economic value of oil fields, so effective analysis of the system's operating conditions, accurate and reliable fault diagnosis, and implementation of corresponding measures are necessary [4,5].
Since the 1960s, many studies have explored fault diagnosis techniques for pumping wells. Rapid scientific and technological development has prompted the emergence of new disciplines in various fields [6][7][8][9] and the progression of diagnostic approaches for pumping wells. Sun et al. [10] proposed the "five-finger test" analysis method, which relies on experienced workers feeling the vibration of the polished rod with their hands and then judging the downhole working state from their own experience. Although easy to implement, its accuracy was low, and it was gradually replaced by other methods. Li et al. [11] proposed the surface dynamometer diagram method, which uses a dynamometer to measure the suspension point dynamometer diagram and matches it against standard dynamometer diagrams to infer the downhole working status. However, this method relies on many assumptions, and some measured dynamometer diagrams are so unusual in shape that it is difficult to select the closest standard diagram, which limits its range of application. Qiu et al. [12] proposed the downhole dynamometer method, in which a downhole dynamometer measures the dynamometer diagram of the oil pump directly. This eliminates many uncertain factors and makes the test more accurate. However, all the downhole equipment must be pulled out of the well to install the instrument, and the installation cost is high, which hinders practical adoption. Gibbs and Li [13,14] proposed the computer diagnosis method, which combines a mathematical model based on the damped wave equation with the surface dynamometer diagram to obtain the corresponding downhole pump dynamometer diagram, which is then used for comparative analysis. Compared with the downhole dynamometer method, it both reduces cost and improves accuracy. However, it still relies on the skills and work experience of the operator, making it difficult to promote. Huang [15] proposed "artificial intelligence diagnosis", which uses computer-aided and intelligent techniques to analyze faults in the entire pumping unit. Because it is simple to operate and highly precise, it has gradually become an active research field with extensive room for development. These methods are compared in Table 1.

Table 1.

Method: "Five-finger test" analysis
Description: Experienced staff feel the vibration of the polished rod in the palm and use experience to infer the downhole working conditions.
Advantages: Simple to operate.
Shortcomings: Poor accuracy.

Method: Surface dynamometer diagnosis
Description: A dynamometer measures the suspension point dynamometer diagram, which is then compared, matched, and interpreted against standard diagrams to infer the downhole working conditions.
Advantages: The type of failure can be defined more accurately.
Shortcomings: There are many assumptions, and the shape of some measured dynamometer diagrams is too unusual to select the closest standard diagram, so the application range is limited.

Method: Downhole dynamometer diagnosis
Description: A downhole dynamometer is installed directly downhole to measure the dynamometer diagram of the oil well pump.
Advantages: Many uncertain factors can be eliminated, and the test accuracy is high.
Shortcomings: All downhole equipment must be pulled out during installation, which is expensive and inconvenient for wide application.

Method: Computer diagnostics
Description: A mathematical model based on the damped wave equation is established and combined with the system motion law derived from the surface dynamometer diagram to solve for the downhole pump dynamometer diagram, to which diagram matching is then applied to judge the downhole working conditions.
Advantages: Saves cost and improves accuracy.
Shortcomings: It still relies on the professional skills and work experience of the operator, which makes wide application difficult.

Method: AI diagnostics
Description: Fault diagnosis of the oil pumping system using artificial intelligence methods with the help of computer equipment.
Advantages: Easy to operate and highly precise.
Shortcomings: The structural design of AI diagnosis is often determined by the designer based on experience, which has certain limitations.

https://doi.org/10.1371/journal.pone.0291346.t001

Usually, the working condition of a well can be preliminarily diagnosed from the suspension point dynamometer diagram, but accurate diagnosis is best based on the pump dynamometer diagram. Since the downhole pump group of this system works at a depth of nearly one thousand meters, it is difficult to measure its working state parameters directly, and a downhole dynamometer is also infeasible because of inconvenient installation and high cost. A reasonable fault diagnosis model and the drawing of the pump dynamometer diagram are therefore the core technologies for fault diagnosis. Theoretical calculation of oil well production combined with field surveys shows that, when the injection-production system of an oil well fails, the probability that the injection pump has failed exceeds 90%, while the production pump continues to operate normally. The commonly used approach, fault diagnosis from the suspension point dynamometer diagram, cannot accurately identify and distinguish the fault types of the injection pump and the production pump. Therefore, based on actual field conditions, this paper proposes a reverse calculation method that takes the production pump as a supplementary boundary condition and uses RS-LVQ to establish a downhole fault diagnosis technique for the rod pump injection and production system in the same well. In this way, faults are identified rapidly and accurately, allowing targeted measures to be implemented [16][17][18] and making up for the shortcomings of traditional methods.

The composition of the rod pump injection and production system in the same well
The rod pump injection and production system in the same well mainly consists of a conventional beam pumping unit, a production pump, a sealing piston, a bridge packer, an injection pump, and an oil-water separation system [3], as shown in Fig 1.
The produced fluid from the oil layer enters the settling cup of the oil-water separator, where gravity rapidly separates it into a denser water layer and a less dense, low-water-cut oil layer. The water in the lower layer flows through the center pipe of the oil-water separation device to the suction port of the injection pump and is reinjected into the injection layer. The concentrated oil in the upper layer flows up through the central flow channel to the suction port of the production pump and is lifted to the surface.

Diagnostic model establishment
The longitudinal vibration equation of the sucker rod string in the rod pump injection and production system in the same well [19] is:

where E_r is the elastic modulus of the sucker rod, Pa; A_r is the sucker rod cross-sectional area, m^2; ρ_r is the sucker rod density, kg/m^3; and v_e is the drag coefficient per unit length.

The diagnostic model of the rod pump injection-production system in the same well can be summarized as the initial boundary value problem of the wave equation as follows:

where:

The initial condition is:

The boundary condition is:

where c is the propagation speed of sound in the rod string, m/s; L_1 is the length of the upper sucker rod of the production pump, m; L_2 is the length of the upper sucker rod of the injection pump, m; L is the total length of the sucker rod, m; u is the elastic displacement of any sucker rod section, m; x is the distance from the suspension point to any sucker rod section, m; t is the time from the starting point, s; v is the damping coefficient of the well fluid on the sucker rod; and T is the cycle. ϕ(x), ψ(x), Φ_1(t), Φ_2(t), and Φ_3(t) are determined by the working conditions of the pumping system.

The static and dynamic load model of the production pump is established and solved to obtain the load-displacement function of the production pump. The load-displacement function of the injection pump is then obtained from the difference between the suspension point load-displacement function, derived from the suspension point dynamometer diagram, and the calculated load-displacement function of the production pump, so that the working condition can be diagnosed from the injection pump dynamometer diagram.
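For orientation, the Gibbs-type damped wave equation commonly used for sucker rod strings is sketched below in the symbols defined above. It is an assumed standard form, including the relations c^2 = E_r/ρ_r and v = v_e/(ρ_r A_r), and is not necessarily the exact expression used in [19]; the initial and boundary functions are not reproduced here.

```latex
\rho_r A_r \frac{\partial^2 u}{\partial t^2}
  = E_r A_r \frac{\partial^2 u}{\partial x^2} - v_e \frac{\partial u}{\partial t}
\quad\Longrightarrow\quad
\frac{\partial^2 u}{\partial t^2}
  = c^2 \frac{\partial^2 u}{\partial x^2} - v \frac{\partial u}{\partial t},
\qquad c = \sqrt{\frac{E_r}{\rho_r}}, \quad v = \frac{v_e}{\rho_r A_r}
```

Dividing the first form by ρ_r A_r gives the second, which is the form whose coefficients c and v appear in the symbol list.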

Reverse calculation method
The formulas above show that three boundary conditions must be known to solve the diagnostic model. The displacement and load boundary conditions at x = 0 can be obtained from the suspension point dynamometer diagram, but the boundary condition at x = L_1 is missing, and it is difficult to measure the dynamometer diagram of the downhole pump directly with existing technology. Field investigation statistics show that injection pump failures account for more than 90% of all failures, while the production pump works normally. Therefore, a reverse calculation method is proposed that assumes normal production pump operation. This technique is combined with the explicit difference and iterative methods to solve the fault diagnosis model of the rod pump injection-production system in the same well. The injection pump dynamometer diagram is obtained by solving for and drawing the production pump dynamometer diagram and combining it with the dynamometer diagram measured at the suspension point on the surface.
In the reverse calculation method, the production pump is assumed to be working normally. The fault diagnosis model of the rod pump injection-production system in the same well is solved by the complement grid method to obtain the production pump dynamometer diagram, and the injection pump dynamometer diagram is then obtained by further solving with the collected suspension point dynamometer diagram. The solution steps of the fault diagnosis model under normal production pump operation are shown in Fig 2.

Numerical analysis
The model was solved in three parts: the upper sucker rod of the production pump, the upper sucker rod of the injection pump, and the interface between them. The solution domain was divided into rectangular grids with steps h_1 and h_2 in the x direction and a step τ = T/m in the t direction, giving the grid coordinates as follows:

where h_1, h_2, and τ are positive constants and n = n_1 + n_2.
The diagnostic models for the upper sucker rods of the production and injection pumps each involve a rod of uniform diameter. Their difference schemes were taken from the available literature [20], while the scheme at the interface was deduced from the continuity conditions, Formulas (8) and (9).

where (F_{i,j})_1 and (F_{i,j})_2 are the loads at the interface on the upper sucker rods of the production and injection pumps, respectively, N; and F_j is the production pump load, N. Formula (8) can also be expressed as:

A method obtained from the available literature [20] was used to deduce Formulas (9) and (10) as follows:

where subscripts 1 and 2 represent the upper sucker rods of the production and injection pumps, respectively.
The differential equation is expressed as:

where

The dynamic load at the interface is:

Data acquisition and preprocessing

Data acquisition
After the diagnostic model of the rod pump injection-production system in the same well was solved, the dynamometer diagrams of the injection pump were drawn and, in combination with the actual working conditions on site, summarized into eight fault types: upper valve leakage, rod breakage, insufficient liquid supply, continuous spraying with pumping, downward impact on the pump, gas interference, lower valve leakage, and simultaneous upper and lower valve leakage. The corresponding fault types are shown in Fig 3. Furthermore, 770 injection pump dynamometer diagrams with relatively distinctive features were selected to establish a sample library, as shown in Table 2.

The coordinate digitization of the injection pump dynamometer diagram.
To enable image feature extraction, the pixel coordinates of the curve outline must be obtained. As shown in Fig 4, the Moore neighborhood boundary tracking algorithm was used: an initial point P_0 was selected arbitrarily on the curve, its eight neighbors were searched, and the neighbor whose pixel value differed least from that of P_0 was selected and denoted P_1. This process was repeated until the trace returned to the initial point (P_N = P_0), and each curve pixel P_0, P_1, ..., P_N was stored.
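The boundary tracking idea can be illustrated with a minimal Moore neighborhood tracer for a binary image. This is a sketch under simplifying assumptions, not the authors' implementation: the start pixel is found by a raster scan rather than chosen arbitrarily, foreground pixels are those equal to 1, and tracing stops the first time the start pixel is reached again. The function name `trace_boundary` is hypothetical.

```python
# Clockwise Moore neighbourhood offsets (row, col), starting from the west neighbour.
OFFSETS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]


def trace_boundary(img):
    """Return the ordered boundary pixels of the first object in a binary image.

    img is a list of rows of 0/1 values. Assumption: the trace stops as soon as
    it returns to the start pixel P0.
    """
    rows, cols = len(img), len(img[0])

    def is_fg(r, c):
        return 0 <= r < rows and 0 <= c < cols and img[r][c] == 1

    # Raster scan for P0; its west neighbour is then guaranteed to be background.
    start = next(((r, c) for r in range(rows) for c in range(cols) if img[r][c] == 1), None)
    if start is None:
        return []

    boundary = [start]
    current = start
    backtrack = (start[0], start[1] - 1)  # background cell we "came from"
    while True:
        d = OFFSETS.index((backtrack[0] - current[0], backtrack[1] - current[1]))
        for k in range(1, 9):  # sweep clockwise around the current pixel
            idx = (d + k) % 8
            nxt = (current[0] + OFFSETS[idx][0], current[1] + OFFSETS[idx][1])
            if is_fg(*nxt):
                prev = OFFSETS[(d + k - 1) % 8]
                backtrack = (current[0] + prev[0], current[1] + prev[1])
                current = nxt
                break
        else:
            return boundary  # isolated pixel: no foreground neighbour
        if current == start:
            return boundary
        boundary.append(current)
```

For a filled 3 x 3 square, the tracer returns the eight outer pixels in clockwise order and skips the interior pixel.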

Normalization of the injection pump dynamometer diagram.
The injection pump dynamometer diagrams drawn from the calculation differed in size. Their sizes and numbers of points were therefore normalized so that the extracted feature values could be compared.

Size normalization
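The mapminmax linear conversion function was used to normalize the size of the data row by row according to the following formula:

\[ y = \frac{(Y_{\max} - Y_{\min})(x - X_{\min})}{X_{\max} - X_{\min}} + Y_{\min} \]

where x is an original value in the row, X_max and X_min are the maximum and minimum of that row, and Y_max and Y_min are the upper and lower limits of the target range. When all the data in a row were equal, that is, X_max = X_min, the data processing software set y = Y_min by default. After normalization, the maximum X and Y coordinates of the points were both equal to 1.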
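The row-wise min-max mapping can be sketched in a few lines of Python. This is an illustrative re-implementation, not the MATLAB function itself; the name `mapminmax_row` is hypothetical, and the target range [0, 1] is an assumption based on the statement that the normalized maxima equal 1 (MATLAB's own default range is [-1, 1]).

```python
def mapminmax_row(row, y_min=0.0, y_max=1.0):
    """Linearly map the values of one row onto [y_min, y_max].

    A constant row (x_max == x_min) is mapped to y_min, following the
    behaviour described in the text.
    """
    x_min, x_max = min(row), max(row)
    if x_max == x_min:
        return [y_min for _ in row]
    scale = (y_max - y_min) / (x_max - x_min)
    return [(x - x_min) * scale + y_min for x in row]


# Usage: normalize the displacement and load rows of one diagram separately.
displacement = mapminmax_row([2.0, 4.0, 6.0])
load = mapminmax_row([3.0, 3.0, 3.0])
```

Normalizing each row independently removes differences in stroke length and load scale between wells, so that only the shape of the diagram is compared.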

Point normalization
The curve points in the injection pump dynamometer diagram were normalized to 300 to obtain the transformed coordinates of each point.
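The text does not state how the curve was resampled to 300 points. One common choice, shown here purely as an assumption, is to place the points at equal arc-length intervals along the closed curve using linear interpolation. The function name `resample_closed_curve` is hypothetical.

```python
import math


def resample_closed_curve(points, n=300):
    """Resample a closed polyline to n points equally spaced along its arc length.

    points is a list of (x, y) tuples in drawing order; the curve is closed by
    joining the last point back to the first. Linear interpolation is assumed.
    """
    closed = points + [points[0]]
    seg = [math.dist(closed[i], closed[i + 1]) for i in range(len(points))]
    total = sum(seg)
    if total == 0:
        return [points[0]] * n

    out = []
    i, acc = 0, 0.0  # current segment index and arc length at its start
    for k in range(n):
        target = total * k / n
        while i < len(seg) - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = (target - acc) / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = closed[i], closed[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```

With a fixed number of points per diagram, curves traced from images of different resolution yield coordinate vectors of equal length.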

Binarization of the injection pump dynamometer diagram.
A binary image refers to an image in which the pixels are either black or white (gray value = 0 or 1), with no

Hu moment invariant characteristics
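The key to recognizing the injection pump dynamometer diagram is extracting representative eigenvalues. The invariant moment method can describe the global characteristics of the image shape [21,22]. The description is unique: for an image of finite area with a piecewise continuous density distribution function, moments of every order exist, and the set of moments uniquely describes all the information in the image. The gray pixel values of a binary image can be represented by an N_1 × N_2 matrix f(x,y), as shown in Fig 8. Its (p+q)-order moment and central moment are:

\[ m_{pq} = \sum_{x}\sum_{y} x^{p} y^{q} f(x,y), \qquad \mu_{pq} = \sum_{x}\sum_{y} (x-\bar{x})^{p} (y-\bar{y})^{q} f(x,y) \]

where \bar{x} = m_{10}/m_{00}, \bar{y} = m_{01}/m_{00}, and (\bar{x}, \bar{y}) is the gray centroid of the target area.
The normalized (p+q)-order central moment of f(x,y) is defined as:

\[ \eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}} \]

where \gamma = (p+q)/2 + 1 and p + q = 2, 3, \ldots After algebraic transformation, the following seven two-dimensional Hu invariant moments were obtained:

\[
\begin{aligned}
\varphi_1 &= \eta_{20} + \eta_{02} \\
\varphi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
\varphi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
\varphi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
\varphi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
          &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \\
\varphi_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
\varphi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
          &\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\end{aligned}
\]

The seven Hu moment invariant features of the binarized injection pump dynamometer diagram were extracted using MATLAB software and stored in a feature vector φ = [φ_1, φ_2, φ_3, φ_4, φ_5, φ_6, φ_7] as its identification feature. Some of the data are listed in Table 3.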
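As an illustration of how the seven invariants can be computed, the following sketch evaluates the standard Hu moment definitions for a small image in plain Python. The authors used MATLAB; this code is an independent, assumed re-implementation, and the function name `hu_moments` is hypothetical.

```python
def hu_moments(img):
    """Seven Hu invariant moments of a 2-D image given as a list of rows.

    Uses the standard definitions: raw moments m_pq, central moments mu_pq
    about the centroid, and normalized moments eta_pq = mu_pq / mu_00^((p+q)/2 + 1).
    The image must contain at least one nonzero pixel.
    """
    pts = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pts)
    xb = sum(x * v for x, _, v in pts) / m00  # centroid x = m10 / m00
    yb = sum(y * v for _, y, v in pts) / m00  # centroid y = m01 / m00

    def eta(p, q):
        mu = sum((x - xb) ** p * (y - yb) ** q * v for x, y, v in pts)
        return mu / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]
```

Because the moments are taken about the centroid and normalized by the area, shifting a shape inside the image leaves all seven values unchanged, which is what makes them usable as shape features.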

Morphological eigenvalues
Morphological characteristics represent key features of the injection pump dynamometer curve and are vital for image recognition. Therefore, for the normalized and binarized injection pump dynamometer diagram, a relationship between the known image size and the pixels was established, and four parameters were extracted: area, rectangularity, elongation, and Euler number. The morphological feature values of the images in the sample library were extracted using MATLAB software, providing basic data for subsequent diagnosis and identification, as listed in Table 4.
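The four morphological parameters are not defined in the text, so the sketch below uses common textbook definitions as assumptions: area is the number of foreground pixels, rectangularity is the area divided by the axis-aligned bounding box area, elongation is the ratio of the shorter to the longer bounding box side, and the Euler number (objects minus holes, 8-connectivity) is computed with the bit-quad counting formula. The function name `morphological_features` is hypothetical.

```python
def morphological_features(img):
    """Area, rectangularity, elongation and Euler number of a binary image.

    img is a list of rows of 0/1 values with at least one foreground pixel.
    All four definitions are assumed textbook forms, not taken from the paper.
    """
    rows, cols = len(img), len(img[0])
    fg = [(r, c) for r in range(rows) for c in range(cols) if img[r][c]]
    area = len(fg)
    height = max(r for r, _ in fg) - min(r for r, _ in fg) + 1
    width = max(c for _, c in fg) - min(c for _, c in fg) + 1

    def px(r, c):
        return 1 if 0 <= r < rows and 0 <= c < cols and img[r][c] else 0

    # Count 2x2 bit-quads over the zero-padded image.
    q1 = q3 = qd = 0
    for r in range(-1, rows):
        for c in range(-1, cols):
            quad = (px(r, c), px(r, c + 1), px(r + 1, c), px(r + 1, c + 1))
            s = sum(quad)
            if s == 1:
                q1 += 1
            elif s == 3:
                q3 += 1
            elif s == 2 and quad[0] == quad[3]:  # the two diagonal patterns
                qd += 1

    return {
        "area": area,
        "rectangularity": area / (height * width),
        "elongation": min(height, width) / max(height, width),
        "euler": (q1 - q3 - 2 * qd) // 4,  # 8-connected objects minus holes
    }
```

For a closed dynamometer curve drawn as a thin outline, the Euler number distinguishes a single closed loop (one hole) from a broken or self-touching outline.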

The RS-LVQ model
The rough set-LVQ fault diagnosis model reflects the direct relationship between the core attributes of the fault features and the output neurons of the LVQ neural network. The training sample data were analyzed and organized into an initial decision table, and the continuous attributes were discretized using the SOM neural network. RS theory was then employed to reduce the conditional attributes in the discretized table, yielding the minimum conditional attribute set and core, which form the final decision table and determine the optimal neural network structure. K-fold cross-validation was used to determine the optimal number of neurons. The reduced data were entered as samples for learning and training. Once the LVQ network was stable in testing, the eigenvalues extracted from an injection pump dynamometer diagram were discretized and entered into the network for cluster analysis to obtain the fault result.
As shown in Fig 9, a loosely coupled structure was adopted, with the rough set serving as the front-end system for data preprocessing. The final reduced decision table was imported into the LVQ system for fault diagnosis.

Establishing the decision table.
The extracted feature data were used to establish a decision table with the following conditional attributes: C_1-C_7 represent the seven Hu moment invariant features φ_1-φ_7, while C_8-C_11 denote the area, rectangularity, elongation, and Euler number, respectively. D is the decision attribute, and 1-8 in the first column denote the eight failure conditions: upper valve leakage, rod breakage, insufficient liquid supply, continuous spraying with pumping, downward impact on the pump, gas interference, lower valve leakage, and simultaneous upper and lower valve leakage. MATLAB was used to extract the decision table data from the eigenvalues of the injection pump dynamometer diagrams of the rod pump injection-production system; part of the original data decision table is shown in Table 5.

Discretization of the continuous attributes.
Rough set theory can only analyze and process discrete (symbolic) attribute values. However, the eigenvalues extracted above are continuous. Therefore, the data must be discretized before rough set theory is used to reduce the fault decision table and extract the rules.

1) Algorithm steps
This paper used the dynamic SOM neural network method to discretize the continuous data attributes according to the following algorithm steps:
1. Input the raw data table DT = (U, C, D, V, f).
2. Determine the continuous attributes c_j (j = 1, 2, ⋯, m).
3. Establish the SOM_j clusters for each c_j to obtain c_j^r, j = 1, 2, ⋯, m, and calculate the α value.
4. If α ≤ α_max, skip to 9).
5. Otherwise, apply v' = (v − min_c)(max_new − min_new)/(max_c − min_c) + min_new to eliminate the effect of dimension and normalize the original set C.
6. Calculate the attribute variances and arrange the attributes from small to large.
7. Cluster on attribute c'_1 to obtain c'_j^r, and calculate the α value.
8. If α ≤ α_max, continue to 9). Otherwise, if j < m, set j ← j + 1 and return to 7); if j = m, set j ← 1 and r ← r + 1 and return to 7).
The meanings of the other symbols are the same as those given above.
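The stopping test in the steps above relies on the incompatibility α = 1 − γ_C(D) of the discretized decision table. A minimal sketch of that quantity follows, assuming the standard rough set definition of the dependency degree γ_C(D) as the fraction of objects whose condition values determine the decision uniquely. The function name `incompatibility` and the toy table are hypothetical.

```python
from collections import defaultdict


def incompatibility(rows, cond_idx, dec_idx):
    """Return alpha = 1 - gamma_C(D) for a discretized decision table.

    rows is a list of tuples, cond_idx lists the column indices of the
    condition attributes C, and dec_idx is the column of the decision D.
    """
    decisions = defaultdict(set)  # condition values -> decisions seen
    counts = defaultdict(int)     # condition values -> number of objects
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        decisions[key].add(row[dec_idx])
        counts[key] += 1
    # Positive region: equivalence classes with a single decision value.
    positive = sum(counts[k] for k, ds in decisions.items() if len(ds) == 1)
    return 1 - positive / len(rows)


# Usage on a toy table: two condition columns and one decision column.
table = [(0, 0, "a"), (0, 1, "b"), (1, 0, "a"), (1, 0, "b")]
alpha = incompatibility(table, [0, 1], 2)
```

A value of α = 0 means the discretization has not merged any samples with different fault labels, so a coarser clustering is acceptable only while α stays at or below α_max.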

2) Algorithm implementation
Eight sample vectors from Table 4, each containing 11 conditional attributes, were used for simulation and training. The program output is shown in Table 6. The network topology and the distance between adjacent neurons are illustrated in Figs 10 and 11, respectively. The clustering results are presented in Table 7.
The analysis of the above table yielded the following conclusions:
1. When the number of network training steps was set to 10, samples 1, 2, 4, 5, 7, and 8 had the same clustering result, while samples 3 and 6 differed. The samples were only preliminarily classified.
2. When the number of network training steps was set to 50, samples 1 and 7, 3 and 6, 5 and 8, and 2 and 4 each shared the same clustering result. The classification was still insufficient.
3. When the number of network training steps was set to 60 or even 80, it was still impossible to distinguish all the samples completely. When the number of training steps reached 100, the clustering results differentiated all the samples. The expected effect was

The MATLAB data processing calculation results were:

The conditional attributes remaining after reduction were C_3, C_4, and C_6, which are three Hu moment invariant features, and the two morphological eigenvalues C_8 (area) and C_9 (rectangularity).

LVQ neural network construction.
Differences in the structure of an LVQ neural network cause variations in its error, performance, and generalization ability. Therefore, accurate diagnostic results depend on designing the neural network specifically to identify faults in the rod pump injection-production system in the same well.

Establishing the number of neurons
The set of characteristic parameters of each dynamometer diagram was reduced to five columns of data (C_3, C_4, C_6, C_8, and C_9), representing five conditional attributes. Therefore, the number of input neurons of the LVQ neural network was set to 5. The results revealed the smallest error and the best performance at 13 neurons.
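The selection of the neuron count by K-fold cross-validation can be sketched generically as follows. This is an assumed outline, not the authors' procedure: the fold assignment is a simple interleaved split, and `mse_fn` is a hypothetical callback that trains a network with the given number of neurons on the training indices and returns the mean square error on the validation indices.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for an interleaved k-fold split."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def select_neuron_count(candidates, n_samples, k, mse_fn):
    """Return (best_count, mean_error) over the candidate neuron counts.

    mse_fn(count, train_idx, val_idx) is a caller-supplied (hypothetical)
    function returning the validation mean square error for one fold.
    """
    best = None
    for count in candidates:
        errors = [mse_fn(count, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_error = sum(errors) / len(errors)
        if best is None or mean_error < best[1]:
            best = (count, mean_error)
    return best
```

Averaging the validation error over the folds makes the comparison between candidate network sizes less sensitive to how any single split happens to fall.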

Establishing the training parameters
To ensure the accuracy of the classification results, the LVQ1 algorithm was used with a maximum of 1000 iterations, a target error of 0.005, and a learning rate of 0.5.
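The LVQ1 update rule moves the winning prototype toward a sample of the same class and away from a sample of a different class. The sketch below shows that rule in plain Python under simplifying assumptions: a constant learning rate, a fixed number of epochs, and no early stopping on a target error. The function names and the toy data are hypothetical; the authors' network was built in MATLAB.

```python
def nearest(x, protos):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(protos)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, protos[i])))


def lvq1_train(samples, labels, prototypes, proto_labels, lr=0.5, epochs=100):
    """Train prototypes with the LVQ1 rule and return the updated copies.

    lr=0.5 matches the learning rate quoted in the text; epochs=100 is an
    illustrative default only.
    """
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            w = nearest(x, protos)
            sign = 1 if proto_labels[w] == y else -1  # attract or repel
            protos[w] = [p + sign * lr * (a - p) for a, p in zip(x, protos[w])]
    return protos


def lvq1_predict(x, protos, proto_labels):
    """Class label of the prototype nearest to x."""
    return proto_labels[nearest(x, protos)]
```

In the diagnosis setting, each input vector would hold the five reduced attributes and each prototype label one of the eight fault types.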

Rough set-LVQ neural network training.
After the neural network was created, its parameters could be reset and modified as necessary. After the LVQ neural network was trained on the reduced data, its error performance and regression plot were checked, as shown in Figs 14 and 15, respectively.
As shown in Fig 14, the error of the diagnostic system decreased as the number of iterations increased, reaching 0.031746 at 1000 steps, which indicates a small error. The correlation coefficient between the target and actual output values was 0.92696, as illustrated in Fig 15, which shows that the predicted values are close to the actual values.
These results indicate that the SOM neural network effectively solved the continuous attribute discretization problem in the decision table and allowed subsequent reduction processing. The rough set attribute reduction improved the diagnosis efficiency of the system and simplified the structure of the LVQ network used for fault pattern recognition. Because the LVQ network served as the back-end processor, its nonlinear mapping ability was fully utilized, supporting high-precision diagnosis results.

Diagnostic system test
The production data of multiple oil wells with rod pump injection and production systems were collected, and 770 typical dynamometer diagrams approved by field technicians and experts were selected for testing. The types and numbers of samples are shown in Table 2. The diagnostic system was used to evaluate injection pump failures, and the results were compared with the actual pump inspection results, as shown in Table 8. The statistics show that 703 of the 770 dynamometer diagrams were correctly diagnosed, an overall coincidence rate of 91.3%, with individual accuracies ranging from 80% to 100%. Even the lowest coincidence rate therefore represents good accuracy. This verified the adaptability of the intelligent integrated diagnosis system for identifying faults in the rod pump injection-production system in the same well.
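The overall coincidence rate follows directly from the reported counts, as the short check below shows; it uses only the figures given in this section.

```latex
\text{coincidence rate} = \frac{703}{770} \approx 0.913 = 91.3\%,
\qquad
\text{misdiagnosed diagrams} = 770 - 703 = 67
```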

Conclusion
The present study establishes a fault diagnosis system for the rod pump injection and production system in the same well, based on RS-LVQ. While keeping the classification ability unchanged, the SOM neural network is used to discretize the original feature data, and RS theory is used to reduce its attributes. After the LVQ fault diagnosis subsystem is established, the reduced decision table is input for learning and training. The case analysis shows that the SOM neural network solves the problem of discretizing continuous attribute values in the decision system, and that RS attribute reduction both improves diagnosis efficiency and simplifies the LVQ network structure used for fault pattern recognition. The strong nonlinear mapping capability of the LVQ network ensures the accuracy of the diagnosis results. Therefore, the diagnostic system can correctly and efficiently diagnose the faults of the rod pump injection-production system in the same well.

Fig 1. A schematic diagram of the rod pump injection and production system in the same well. 1. The conventional beam pumping unit. 2. The production pump. 3. The upper production pump valve. 4. The lower production pump valve. 5. The sealing piston. 6. The bridge packer. 7. The injection pump. 8. The upper injection pump valve. 9. The lower injection pump valve. 10. The oil-water separator. https://doi.org/10.1371/journal.pone.0291346.g001

Fig 2. The solution steps of the fault diagnosis model. https://doi.org/10.1371/journal.pone.0291346.g002

Fig 5 shows the step flow of this algorithm, while Fig 6 presents the dynamometer diagram of an injection pump after point normalization.

Fig 11. A schematic diagram of the distance between the adjacent neurons. https://doi.org/10.1371/journal.pone.0291346.g011

The output neurons were set to 8, representing the eight fault types. The K-fold cross-validation method was used for network training and verification, and the network error function was defined as the mean square error of the verification data. With the same number of training steps, 20 competitive layer sizes, from 1 to 20 neurons, were tested, with a corresponding

Fig 12. The flow chart of the rough set attribute reduction algorithm. https://doi.org/10.1371/journal.pone.0291346.g012

Fig 13 shows a structural diagram of the constructed LVQ neural network.

Table 2. Number of samples in the sample library. Columns: Sample No., Fault type, Number of samples.
https://doi.org/10.1371/journal.pone.0291346.t002

Table 5. Part of the raw data decision table.

Output DT_p, where α denotes the incompatibility, α = 1 − γ_C(D); α_max is the maximum allowable incompatibility; ε is a constant coefficient (a small non-negative number); r_min is the initial number of clusters; c_j^r is the discretized attribute column; and DT_p is the discretized decision table.